Skip to content

feat: Add random state feature. #150

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: john-development
Choose a base branch
from

Conversation

john-halloran
Copy link

  • feat: Added random_state feature for reproducibility.

Copy link
Contributor

@sbillinge sbillinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

We have to decide how much testing we will add. Ideal is 100% coverage, optimal is probably less.

Maybe write the docstrings so I can understand what the class does, then we can decide what to test?

components=None,
random_state=None,
):

self.MM = MM
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more descriptive name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to n_components, which is what sklearn.decomposition.NMF uses.

MM,
Y0=None,
X0=None,
A=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more descriptive name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many different standards for what to name these matrices. Zero agreement between sources that use NMF. I'm inclined to eventually use what sklearn.decomposition.non_negative_factorization uses, which would mean MM->X, X->W, Y->H. But I'd like to leave this as is for the moment until there's a consensus about what would be the most clear or standard. If people will be finding this tool from the sNMF paper, there's also an argument for using the X, Y, and A names because that was used there.

@@ -4,8 +4,20 @@


class SNMFOptimizer:
def __init__(self, MM, Y0=None, X0=None, A=None, rho=1e12, eta=610, max_iter=500, tol=5e-7, components=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need a docstring here and in the init. Please see scikit-package FAQ about how to write these. Also, look at Yucong's code or diffpy.utils?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added one here. The package init dates back to the old codebase, but as soon as that is updated it will get a docstring as well.

@@ -15,23 +27,22 @@ def __init__(self, MM, Y0=None, X0=None, A=None, rho=1e12, eta=610, max_iter=500
# Capture matrix dimensions
self.N, self.M = MM.shape
self.num_updates = 0
self.rng = np.random.default_rng(random_state)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a more descriptive variable name? Is this a range? What is the range?

if self.A is None:
self.A = np.ones((self.K, self.M)) + np.random.randn(self.K, self.M) * 1e-3 # Small perturbation
self.A = np.ones((self.K, self.M)) + self.rng.normal(0, 1e-3, size=(self.K, self.M))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

K and M are probably good names if the matrix decomposition equation is in hte docstring, so they get defined there.

@john-halloran
Copy link
Author

This is great!

We have to decide how much testing we will add. Ideal is 100% coverage, optimal is probably less.

Maybe write the docstrings so I can understand what the class does, then we can decide what to test?

Thanks, will work on resolving these. To be clear, for things like the docstrings would you prefer I make new PRs, get those merged, then rebase this one, or just add to this existing PR?

@john-halloran
Copy link
Author

For now, I will assume anything given as feedback in this PR could be in scope to include.

Copy link
Contributor

@sbillinge sbillinge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start. I left a couple of comments.

@@ -17,6 +17,64 @@ def __init__(
components=None,
random_state=None,
):
"""Run sNMF based on an ndarray, parameters, and either a number
Copy link
Contributor

@sbillinge sbillinge Jun 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! Thanks for this. Please see here for our docstring standards, I am not sure if you looked at it:
https://scikit-package.github.io/scikit-package/programming-guides/billinge-group-standards.html#docstrings

For classes it is a bit tricky because what info do we put in the "Class" docstring and what info do we put in the "constructor" (i.e., the __init__()) docstring. After some googling we came up with the breakout that is shown in the DiffractionObjects class that is shown there. We would be after something similar here.

By way of example, I would probably do like this in this case

def SNMFOptimizer:
       '''Configuration and methods to run the stretched NMF algorithm, sNMF

        Instantiating the SNMFOptimizer class runs all the analysis
        immediately. The results can then be accessed as instance attributes
        of the class (X, Y, and A). 

        Please see <reference to paper> for more details about the algorithm.

        Attributes
        -----------
         mm : ndarray
            The array containing the data to be decomposed. Shape is (length_of_signal, number_of_conditions)
        y0 : ndarray
            The array containing initial guesses for the component weights
            at each stretching condition.  Shape is (number_of_components, number_of_conditions
            ...
'''

put future development plans into issues, not in the docstring. Just describe the current behavior. Try and keep it brief but highly informational.

To conform to PEP8 standards I lower-cased the variables. I know they correspond to matrices but we should decide which standard to break. The tie-breaker should probably be scikit-learn. Whatever they do, let's do that. Let's also add a small comment (not in the docstring) to remind ourselves in the future if it breaks PEP8 or it will annoy me every time we revisit it and I will try and change it back......

Conditions on instantiation will go in the constructor docstring.

That one describes the init method so should look more like a function docstring. It would look something like....

    def __init__(mm....)
        '''Initialize a SNMFOptimizer instance and run the optimization
 
       Parameters
       ------------
            mm : ndarray
                The array containing the data to be decomposed. Shape is (length_of_signal, number_of_conditions)
           y0 : ndarray Optional.  Defaults to None
                The array containing initial guesses for the component weights
                at each stretching condition.  Shape is (number_of_components, number_of_conditions
        ...

I think there was some text before about how Y0 was required. But if it is required it may be better to make it a required (positional) variable in the constructor and not have it optional. we can discuss design decisions too if you like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants